supernet training
Searching Efficient Semantic Segmentation Architectures via Dynamic Path Selection
Existing NAS methods for semantic segmentation typically apply uniform optimization to all candidate networks (paths) within a one-shot supernet. However, the concurrent existence of both promising and suboptimal paths often results in inefficient weight updates and gradient conflicts. This issue is particularly severe in semantic segmentation due to its complex multi-branch architectures and large search space, which further degrade the supernet's ability to accurately evaluate individual paths and identify high-quality candidates. To address this issue, we propose Dynamic Path Selection (DPS), a selective training strategy that leverages multiple performance proxies to guide path optimization. DPS follows a stagewise paradigm, where each phase emphasizes a different objective: early stages prioritize convergence, the middle stage focuses on expressiveness, and the final stage emphasizes a balanced combination of expressiveness and generalization. At each stage, paths are selected based on these criteria, concentrating optimization efforts on promising paths, thus facilitating targeted and efficient model updates. Additionally, DPS integrates a dynamic stage scheduler and a diversity-driven exploration strategy, which jointly enable adaptive stage transitions and maintain structural diversity among selected paths. Extensive experiments demonstrate that, under the same search space, DPS can discover efficient models with strong generalization and superior performance.
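The stagewise selection described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration, not the authors' implementation: the proxy names (`convergence`, `expressiveness`, `generalization`), the equal weighting in the final stage, the Hamming-distance diversity filter, and the helper names `stage_score` and `select_paths` are all hypothetical. The dynamic stage scheduler is omitted.

```python
from typing import Callable, Dict, List, Tuple

# A path is encoded as one operation index per searchable layer (assumed encoding).
Path = Tuple[int, ...]


def hamming(a: Path, b: Path) -> int:
    """Number of layers in which two paths choose different operations."""
    return sum(x != y for x, y in zip(a, b))


def stage_score(proxies: Dict[str, float], stage: str) -> float:
    """Map proxy values to one score; the emphasis depends on the stage."""
    if stage == "early":
        return proxies["convergence"]
    if stage == "middle":
        return proxies["expressiveness"]
    if stage == "final":
        # Equal weighting is an assumption; the abstract only says "balanced".
        return 0.5 * proxies["expressiveness"] + 0.5 * proxies["generalization"]
    raise ValueError(f"unknown stage: {stage}")


def select_paths(
    candidates: List[Path],
    proxy_fn: Callable[[Path], Dict[str, float]],
    stage: str,
    k: int,
    min_distance: int = 1,
) -> List[Path]:
    """Greedily keep up to k top-scoring paths that differ from every
    already selected path in at least min_distance layers."""
    ranked = sorted(
        candidates,
        key=lambda p: stage_score(proxy_fn(p), stage),
        reverse=True,
    )
    selected: List[Path] = []
    for path in ranked:
        if all(hamming(path, q) >= min_distance for q in selected):
            selected.append(path)
        if len(selected) == k:
            break
    return selected
```

Only the selected paths would then receive weight updates in the current training step. The greedy distance filter is one simple way to keep the selected set structurally diverse; the paper's diversity-driven exploration strategy may work differently.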
d072677d210ac4c03ba046120f0802ec-AuthorFeedback.pdf
We respond to the concerns point by point below. Why does distilling prioritized paths improve architecture rating? More sufficient training of subnets leads to a more accurate architecture rating [6] (Sec. 4.3). Which set is used to train the matching network? We will revise the manuscript to make this point clearer.
APPENDIX: In this section, we provide the details of our implementation and proofs for reproducibility.
... 's hidden state by h. Then we need to calculate the second part of Eq. ... Using Bayes' theorem, we have: p ...

In Section 4.3, we devise a Sigmoid function to adapt γ during supernet training, which is defined as:

$$\gamma(t) = 1 - \mathrm{Sigmoid}\left[\left(\frac{t}{\text{total epochs}} \times 2 - 1\right) \times b\right] \tag{19}$$

Section 3.2 theoretically demonstrates the benefit of the proposed architecture complementation loss function,
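
The Sigmoid schedule for γ can be sketched in a few lines. This sketch assumes that Eq. (19) reads γ(t) = 1 − Sigmoid((2t / total_epochs − 1) · b), since the operators are not fully legible in the extracted text. The function name `gamma_schedule` and the default `b = 5.0` are hypothetical and chosen only for illustration.

```python
import math


def sigmoid(x: float) -> float:
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-x))


def gamma_schedule(t: int, total_epochs: int, b: float = 5.0) -> float:
    """Assumed reading of Eq. (19): gamma decays smoothly from near 1 to near 0.

    t            -- current epoch, 0 <= t <= total_epochs
    total_epochs -- length of supernet training
    b            -- steepness of the transition (hypothetical default)
    """
    # Rescale training progress from [0, total_epochs] to [-1, 1].
    progress = 2.0 * t / total_epochs - 1.0
    return 1.0 - sigmoid(progress * b)


if __name__ == "__main__":
    for epoch in (0, 25, 50, 75, 100):
        print(epoch, round(gamma_schedule(epoch, 100), 4))
```

Under this reading, γ equals 0.5 exactly at the midpoint of training, and a larger `b` makes the transition around the midpoint sharper.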